Source: Mark van der Laan and Sherri Rose. Targeted learning: causal inference for observational and experimental data. Springer Series in Statistics, 2011.
# Implementation
In R, we create a function that generates the data. Its input is the number of draws, and its output is the observed data (ObsData) plus the counterfactual outcomes (Y.1, Y.0).
The observed data:
1. Y: Mortality
2. A: Binary treatment for emergency presentation at cancer diagnosis
3. W1: Gender (1 male; 0 female)
4. W2: Age at diagnosis (0 <65; 1 >=65)
5. W3: Cancer TNM classification (scale from 1 to 4)
6. W4: Comorbidities (scale from 1 to 5)
#install.packages("broom")
options(digits=3)
generateData <- function(n){
  # baseline covariates
  w1 <- rbinom(n, size=1, prob=0.5)
  w2 <- rbinom(n, size=1, prob=0.65)
  w3 <- runif(n, min=0, max=4)
  w4 <- runif(n, min=0, max=5)
  # treatment and observed outcome
  A <- rbinom(n, size=1, prob= plogis(-0.4 + 0.2*w2 + 0.15*w3 + 0.2*w4))
  Y <- rbinom(n, size=1, prob= plogis(-1 + A -0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4))
  # counterfactual outcomes under A = 1 and A = 0
  Y.1 <- rbinom(n, size=1, prob= plogis(-1 + 1 -0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4))
  Y.0 <- rbinom(n, size=1, prob= plogis(-1 + 0 -0.1*w1 + 0.3*w2 + 0.25*w3 + 0.2*w4))
  # return data.frame
  data.frame(w1, w2, w3, w4, A, Y, Y.1, Y.0)
}
set.seed(7858)
ObsData <- generateData(n=1000)
True_Psi <- mean(ObsData$Y.1-ObsData$Y.0);True_Psi
[1] 0.241
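Because Y.1 and Y.0 are Bernoulli draws, the value of True_Psi computed from 1,000 observations carries Monte Carlo noise. As a rough sketch (not part of the original tutorial), the same quantity can be approximated more stably by averaging the difference between the two counterfactual probabilities over a large simulated sample. The helper name `trueATE` and the sample size of one million are assumptions made for illustration; the coefficients are copied from generateData above.

```r
# Sketch: approximate the true average treatment effect (ATE) from the known
# outcome model rather than from simulated counterfactual draws.
# trueATE is a hypothetical helper; its coefficients mirror generateData().
trueATE <- function(n) {
  w1 <- rbinom(n, size = 1, prob = 0.5)
  w2 <- rbinom(n, size = 1, prob = 0.65)
  w3 <- runif(n, min = 0, max = 4)
  w4 <- runif(n, min = 0, max = 5)
  # linear predictor of the outcome model without the treatment term
  lp <- -1 - 0.1 * w1 + 0.3 * w2 + 0.25 * w3 + 0.2 * w4
  # P(Y = 1 | A = 1, W) minus P(Y = 1 | A = 0, W), averaged over W
  mean(plogis(lp + 1) - plogis(lp))
}

set.seed(7858)
Psi_large <- trueATE(n = 1e6)  # the sample size is an arbitrary choice
Psi_large
```

The result should be close to, but not identical to, the True_Psi obtained from the 1,000 simulated counterfactuals, since both target the same marginal risk difference.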
Bias_Psi <- summary(lm(data=ObsData, Y~A));Bias_Psi$coef
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.543 0.0238 22.79 6.48e-93
A 0.210 0.0300 6.99 4.98e-12
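# Naive estimate: difference in mean mortality between treated and untreated
Y.1.A.1 <- mean(ObsData$Y[ObsData$A == 1]) # mean of Y among A = 1
Y.1.A.0 <- mean(ObsData$Y[ObsData$A == 0]) # mean of Y among A = 0 (the intercept of the regression above)
Bias_Psi2 <- (Y.1.A.1)-(Y.1.A.0); Bias_Psi2 # matches the coefficient of A in the regression above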
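The naive contrast is the difference in mean mortality between the two treatment groups, and with a single binary regressor it coincides with the coefficient of A from lm. The following is a minimal sketch of that check, assuming a compact copy of the data-generating code; `simObs`, `d`, `group_means`, `naive_diff` and `lm_coef` are hypothetical names introduced here so that the block runs on its own.

```r
# Sketch: the unadjusted difference in means equals the lm coefficient of A.
# simObs is a hypothetical compact copy of generateData() without counterfactuals.
simObs <- function(n) {
  w1 <- rbinom(n, size = 1, prob = 0.5)
  w2 <- rbinom(n, size = 1, prob = 0.65)
  w3 <- runif(n, min = 0, max = 4)
  w4 <- runif(n, min = 0, max = 5)
  A <- rbinom(n, size = 1, prob = plogis(-0.4 + 0.2 * w2 + 0.15 * w3 + 0.2 * w4))
  Y <- rbinom(n, size = 1, prob = plogis(-1 + A - 0.1 * w1 + 0.3 * w2 + 0.25 * w3 + 0.2 * w4))
  data.frame(w1, w2, w3, w4, A, Y)
}

set.seed(7858)
d <- simObs(1000)

# mean of Y within each treatment group (names are "0" and "1")
group_means <- tapply(d$Y, d$A, mean)
naive_diff <- unname(group_means["1"] - group_means["0"])

# coefficient of A from the unadjusted linear regression
lm_coef <- unname(coef(lm(Y ~ A, data = d))["A"])

all.equal(naive_diff, lm_coef)
```

Subsetting the outcome vector before taking the mean, or using tapply as here, avoids indexing the result of mean(), which is a single number.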
# Interactive table of the observed data
# install.packages("DT") # install DT first
library(DT)
datatable(ObsData, options = list(pageLength = 5))
The first step is to estimate the initial probability of the outcome (Y) given the treatment (A) and the set of covariates (W), denoted \(Q_{0}(A,W)\). To estimate \(Q_{0}(A,W)\) we can use a standard logistic regression model:
\(\text{logit}[P(Y=1|A,W)]=\beta_{0}+\beta_{1}A+\beta_{2}^{T}W\).
Therefore, we can estimate the initial probability as follows: \(\hat{Q}_{0}(A,W)=\text{expit}(\hat{\beta}_{0}+\hat{\beta}_{1}A+\hat{\beta}_{2}^{T}W)\). (1) The predicted probability can also be estimated using the Super Learner library implemented in the R package "SuperLearner", which can include any terms that are functions of A or W (e.g., polynomial terms of A and W, as well as interaction terms of A and W). Consequently, for each subject, the predicted probabilities for both potential outcomes can be estimated by setting A = 0 and A = 1 for everyone, respectively: \(\hat{Q}_{0}(0,W)=\text{expit}(\hat{\beta}_{0}+\hat{\beta}_{2}^{T}W)\) and \(\hat{Q}_{0}(1,W)=\text{expit}(\hat{\beta}_{0}+\hat{\beta}_{1}+\hat{\beta}_{2}^{T}W)\).
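The initial estimation step described above can be sketched in base R with a main-terms logistic regression. This is an illustration under stated assumptions, not the tutorial's own implementation: it uses glm rather than the Super Learner, and the names `simObs`, `d`, `Q_fit`, `Q_AW`, `Q_1W`, `Q_0W` and `Psi_plugin` are hypothetical. The data are simulated with a compact copy of the data-generating code so that the block runs on its own.

```r
# Sketch: initial outcome model and predictions under A = 1 and A = 0.
# simObs is a hypothetical compact copy of generateData() without counterfactuals.
simObs <- function(n) {
  w1 <- rbinom(n, size = 1, prob = 0.5)
  w2 <- rbinom(n, size = 1, prob = 0.65)
  w3 <- runif(n, min = 0, max = 4)
  w4 <- runif(n, min = 0, max = 5)
  A <- rbinom(n, size = 1, prob = plogis(-0.4 + 0.2 * w2 + 0.15 * w3 + 0.2 * w4))
  Y <- rbinom(n, size = 1, prob = plogis(-1 + A - 0.1 * w1 + 0.3 * w2 + 0.25 * w3 + 0.2 * w4))
  data.frame(w1, w2, w3, w4, A, Y)
}

set.seed(7858)
d <- simObs(1000)

# logit[P(Y = 1 | A, W)] = b0 + b1 * A + b2' W, fitted by maximum likelihood
Q_fit <- glm(Y ~ A + w1 + w2 + w3 + w4, family = binomial, data = d)

# predicted probability at the observed treatment
Q_AW <- predict(Q_fit, type = "response")

# predicted probabilities with treatment set to 1 and to 0 for everyone
Q_1W <- predict(Q_fit, newdata = transform(d, A = 1), type = "response")
Q_0W <- predict(Q_fit, newdata = transform(d, A = 0), type = "response")

# plug-in contrast based on the initial fit only (no targeting step)
Psi_plugin <- mean(Q_1W - Q_0W)
Psi_plugin
```

Setting A through `transform` leaves the covariates untouched, so each subject contributes one prediction per treatment level. The plug-in contrast is only a starting point, since it relies entirely on the outcome model being correctly specified.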
Thank you for participating in this tutorial.
If you have updates or changes that you would like to make, please send me a pull request. Alternatively, if you have any questions, please e-mail me.
Miguel Angel Luque Fernandez
E-mail: miguel-angel.luque at lshtm.ac.uk
Twitter @WATZILEI
devtools::session_info()
Bühlmann P, Drineas P, Laan M van der, Kane M. (2016). Handbook of big data. CRC Press.
Greenland S, Robins JM. (1986). Identifiability, exchangeability, and epidemiological confounding. International journal of epidemiology 15: 413–419.
Gruber S, Laan M van der. (2011). tmle: An R package for targeted maximum likelihood estimation. UC Berkeley Division of Biostatistics Working Paper Series.
Laan M van der, Rose S. (2011). Targeted learning: Causal inference for observational and experimental data. Springer Series in Statistics.